Abstract

An important trend in the design of storage subsystems is a move toward direct network 
attachment. Network-attached storage offers the opportunity to off-load distributed file
system functionality from dedicated file server machines and execute many requests
directly at the storage devices. For this strategy to lead to better performance as perceived
by users, the response time of distributed operations must improve. In this paper, we 
analyze measurements of an Andrew File System (AFS) server that we recently upgraded 
in an effort to improve client performance in our laboratory. While the original server’s 
overall utilization was only about 3%, we show how burst loads were sufficiently intense 
to lead to periods of poor response time significant enough to trigger customer 
dissatisfaction. In particular, we show how, after adjusting for network load and traffic to 
non-project servers, 50% of the variation in client response time was explained by 
variation in server CPU utilization. That is, clients saw long response times in large part 
because the server was often over-utilized when it was used at all. Using these measures, 
we see that off-loading file server work in a network-attached storage architecture has the 
potential to benefit user response time. Computational power in such a system scales 
directly with storage capacity, so the slowdown during burst periods should be reduced. 


This research is sponsored by DARPA/ITO through ARPA Order D306, and issued by Indian Head 
Division Naval Surface Warfare Center, under contract N00174-96-0002. The views and conclusions
contained in this document are those of the authors and should not be interpreted as representing official 
policies, either expressed or implied, of any sponsoring or supporting agency, including the Defense 
Advanced Research Projects Agency and the United States Government. 

1. Introduction


Recent trends in the computer industry have greatly increased the demands for common, 
shared information repositories. In most cases, these have taken the form of distributed 
file systems that are shared across a workgroup, organization-wide, or even world-wide.
Distributed file systems, with a number of machines acting as “servers” and a much
larger number of clients, have become popular due to a number of factors, including
separation of administrative concerns, sharing of data, and transparency [Spasojevic96].

Advances in other computing technologies have made possible many novel applications 
that are placing increasing demands on distributed storage systems. The delivery of video 
and audio, large-scale parallel applications, and the growth of the Internet have increased 
demands on distributed information systems both in terms of the resources required by 
individual applications and the aggregate demands made by a continually increasing 
number of clients. 





Figure 1 - Traditional Distributed File System 


At the core of all distributed information systems lies a set of server resources that are 
becoming increasingly loaded as the demands increase. A traditional distributed file 
system model, where storage is simply embodied in the disk and device driver, is 
illustrated in Figure 1. This picture explains in part why increasing load on distributed
file systems often requires fast file servers - the file server must traverse two protocol 
stacks for each client request. Data must move from attached disk drives, across the SCSI 
bus, through the server’s memory system, back across the system bus, down the network 
protocol stack and, finally, onto the network wire. The server has very little “interest” in 
the data, yet it must move it through its memory hierarchy - possibly several times - in 
order to satisfy all the protocol layers involved. 

In conjunction with this pressure toward using faster machines as file servers, recent 
years have seen rapid development, both in terms of areal density and in the raw 
bandwidth that can be provided off the platters of fixed storage devices. On top of these 
trends, perhaps the largest change comes from standardizing storage interfaces. The
adoption of the SCSI interface for storage devices allowed storage vendors to optimize 
below a common protocol, and application and file system developers to optimize above 
it. By specifying a separate high-level “logical” interface and a physical interface, SCSI 
made possible numerous optimizations inside disk controllers including RAID,
transparent recovery management, dynamic remapping, and storage migration. A
common interface to operating system software allowed users to buy drives based on
price and performance, rather than on compatibility requirements with other parts of their
computer systems. This model has led to typical, high-performance distributed file
systems that today look more like Figure 2. There is one interconnect for communication
between clients and servers (IP or IPX over ATM or Ethernet), and another for 
communication between servers and disks (SCSI). 








Figure 2 - Actual Distributed File System Architecture Today 

The difficulty with this architecture is that a good portion of the overall system power is
“dissipated” in the server system that bridges the gap between SCSI and the distributed
file system protocol used by clients. With relatively slow storage devices and relatively
slow networks, this additional overhead has until now been hidden among other
limitations. The continued development of disk technology has made possible products
with sustained data rates of up to 12 MB/s shipping today, and 40 MB/s does not look
unreasonable by the end of the decade. Fibre Channel interconnects also eliminate the
traditional SCSI bus as a bottleneck. ATM, Fast Ethernet, and Myrinet provide client
network rates of 12 MB/s today and 100 MB/s in the near future. These advances mean
that the amount of room to “hide” inefficiencies in distributed file server implementations
is shrinking dramatically.

The study described in the rest of this paper examines the requirements placed on file
server architectures by studying the behavior of current distributed file system
technology. Specifically, we have analyzed the system-level behavior of an AFS
(Andrew File System) server in our environment. The following sections will present the
behavior we have observed and the pressure on file server performance.

Section 2 provides a brief overview of AFS and presents our measurement methodology,
tools, and environment. Section 3 provides a summary of some of the workload
characteristics we observed. Section 4 discusses the factors that affect AFS performance
as perceived by users. Section 5 discusses the potential available through the use of
network-attached storage devices. Finally, we conclude in Section 6 and discuss avenues
of future work.
2. Experimental Methodology
2.1. Andrew File System

At Carnegie Mellon (and at hundreds of other large institutions around the world), the
Andrew File System is used by nearly all computer users. The major contribution of AFS
[Howard88] was the focus on conserving server resources, allowing a large number of
workstations and users to be supported by a relatively small amount of [...]. Each client
dedicates a portion of its local disk space as a cache for frequently accessed
remote data. Data in client caches is kept up-to-date through the use of a strong
consistency protocol based on callbacks. When a client accesses a particular file from an
AFS server, the server marks a callback for that data and client and promises to inform
the client when the data is changed. Rather than having a large number of clients
constantly checking in at the file server to see if data has changed, the responsibility for
cache invalidation lies with the server.
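
The callback mechanism described above can be illustrated with a small sketch. This is not AFS code: the class names `CallbackServer` and `CachingClient`, and the choice to let a writer keep its callback, are assumptions made only for illustration. It shows the one property the text relies on, namely that cache hits generate no server traffic and that the server, not the client, initiates invalidation.

```python
from collections import defaultdict


class CallbackServer:
    """Toy file server that remembers which clients cache each file (hypothetical)."""

    def __init__(self):
        self.files = {}                     # path -> contents
        self.callbacks = defaultdict(set)   # path -> clients holding a callback

    def fetch(self, client, path):
        # Record the callback promise before handing out the data.
        self.callbacks[path].add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Break callbacks: notify every other client that cached this file.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.invalidate(path)
        # Assumption: the writer holds a current copy, so it keeps a callback.
        self.callbacks[path].add(writer)


class CachingClient:
    """Toy client with a local cache that trusts callbacks for consistency."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_fetches = 0

    def read(self, path):
        if path not in self.cache:          # miss: one request to the server
            self.cache[path] = self.server.fetch(self, path)
            self.server_fetches += 1
        return self.cache[path]             # hit: no server traffic at all

    def write(self, path, data):
        self.cache[path] = data
        self.server.store(self, path, data)

    def invalidate(self, path):
        # Called by the server when another client changes the file.
        self.cache.pop(path, None)
```

In this sketch a client that reads the same file repeatedly contacts the server once, and contacts it again only after another client's write has broken the callback.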

In the Spring of 1996, our lab upgraded its AFS server in response to our users’
complaints about AFS performance. A major motivation in writing this paper is to
understand the reasons behind the upgrade and determine the implications for
distributed file systems built on network-attached storage architectures.

2.2. Measurement Environment

The measurements reported here were taken from a single file server over the course of a
two month period at the beginning of 1996. This server contained all of the project
volumes for the Parallel Data Laboratory (PDL). The server was a Sun SPARCstation 4/60
with 24 MB [...]. Its clients included [...] (models 300, 400, 500 and 600 and PCI
models 200 and 400), nine IBM RS/6000s located in a single laboratory, and fifteen
additional machines of varying types, ranging in power from DECstation 5000s to a
SPARCstation 20, in this lab and in the offices of students and faculty. The workload, a
diverse set of activities one would expect from a [...], included software development,
document preparation, [...].

The School of Computer Science network, to which all these machines are connected,
[...] floor of its building, with an additional segment for the central machine room
where all AFS servers are housed, all of which are connected to a single bridged
backbone. The cs.cmu.edu AFS cell, in which our measurements were taken, consists of
25 (primarily SPARCstation) dedicated servers providing home directories; repositories
for shared, locally-maintained software [...] operating systems with AFS versions
ranging from locally modified 3.1 to 3.4beta.

2.3. Analysis Tools

Traces of file server activity were taken with the aid of a tracing package developed by
[...] period, resulting in over 4 GB of data.

[...] additional 400 MB of raw data.

To track performance of the network, [...] clients (one on each floor with client
machines) and to the server.

We developed a set of [...]. In the following sections we will provide plots of [...]
variances, and Pearson r correlation coefficients and [...] the variation in a [...] the
characteristics of underlying system factors [Kirk90].

3. Workload Characteristics 

In this section, we [...] operations at [...] and the transfer size distributions at the
server.


1 Due to the highly distributed nature of [...] determine exactly what [...], making some
variations more difficult to explain.
3.1. Client Caching

[...] extremely good [Spasojevic96, Howard88]. Table 1 [...] twenty clients for which we
have the most complete data. This data emphasizes the well-established fact that there
is a high degree of temporal locality in user access streams, and that local disk caching
in AFS removes a considerable [...]. The data shown represents measurements from a
single week of our traces - specifically the week of January 29, 1996 to February 4, 1996.
This week-long period will be used throughout the rest of the paper.


[Table 1 body not recoverable; rows: data, metadata]


Table 1 - Client Cache Hit Ratio 
3.2. Operation Distribution

Table 2 shows a breakdown of the most frequently used AFS operations and their relative
popularity. The Clients column shows the total for the [...] reported above over the
course of the same week. Note that the number of [...] requests does not match up
because this is not a closed system: there were additional clients making requests of
the PDL server, and the PDL [...] AFS servers (as we will discuss in more detail later).
The total amount of data transferred by clients was 993 MB in FetchData requests and
520 MB in [...] a total of 750 MB of data [...] FetchData and accepted [...].


                       Clients                  Server
AFS Operation          total       fraction     total     fraction
FetchStatus              748,620   68.0%        412,695   43.4%
StoreStatus               20,085    1.8%         22,642    2.4%
FetchData                174,717   15.9%         62,288    6.5%
StoreData                 46,630    4.2%         32,414    3.4%
CreateFile                15,407    1.4%         17,089    1.8%
RemoveFile                17,242    1.6%         20,422    2.1%
[...]                          0    0.0%        244,636   25.7%
[...]                     50,568    4.6%        122,393   12.9%
GiveUpCallbacks           28,343    2.6%         17,298    1.8%
total                  1,101,612                951,877

Table 2 - Distribution of AFS Operations

3.3. Request Sizes

Table 3 shows the distribution of request sizes over the course of a week. As seen in 
previous studies, small requests dominate the mix, while most of the bytes are moved in 
large requests [Spasojevic96, Baker91]. 80% of reads and 65% of writes are for less than
8 kilobytes. However, for StoreData requests, more than two-thirds of the bytes are
moved at the largest request size. This means that system designers must consider 
optimizations that maximize the bandwidth of the largest requests without adversely 
affecting the latency of the majority of small operations. 







                     FetchData              StoreData
Request size         requests   fraction    requests   fraction
up to 128 bytes        19,503   31.3%          7,607   23.5%
129 bytes to 1 K        3,663    5.9%          3,196    9.9%
1 K to 8 K             24,858   39.9%         10,035   31.0%
8 K to 16 K             2,127    3.4%          2,244    6.9%
16 K to 32 K            1,889    3.0%          2,510    7.8%
more than 32 K         10,245   16.4%          6,789   21.0%
total                  62,285                 32,381

Table 3 - Distribution of Request Sizes
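
The claim that small requests dominate the mix can be checked against the counts in Table 3. The following is a minimal sketch; it assumes that the first pair of columns in Table 3 corresponds to FetchData (reads) and the second to StoreData (writes), which is inferred from the totals rather than stated in the table itself.

```python
# Request counts from Table 3, ordered from the smallest to the largest size bucket.
# Assumption: the first column pair is FetchData, the second is StoreData.
BUCKETS = ["<=128 B", "129 B-1 K", "1 K-8 K", "8 K-16 K", "16 K-32 K", ">32 K"]
FETCH_DATA = [19503, 3663, 24858, 2127, 1889, 10245]
STORE_DATA = [7607, 3196, 10035, 2244, 2510, 6789]


def cumulative_fractions(counts):
    """Return the running fraction of requests at or below each size bucket."""
    total = sum(counts)
    running = 0
    fractions = []
    for count in counts:
        running += count
        fractions.append(running / total)
    return fractions


fetch_cdf = cumulative_fractions(FETCH_DATA)
store_cdf = cumulative_fractions(STORE_DATA)

# Index 2 is the "1 K to 8 K" bucket, so these are the shares of requests
# that move at most 8 kilobytes.
small_fetch = fetch_cdf[2]
small_store = store_cdf[2]

if __name__ == "__main__":
    for name, f, s in zip(BUCKETS, fetch_cdf, store_cdf):
        print(f"{name:>10}  fetch {f:6.1%}  store {s:6.1%}")
```

Under that assumption the counts give roughly 77% of reads and 64% of writes at or below 8 kilobytes, in line with the rounded figures quoted in the text.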


4. Impacts on User-Perceived Performance
4.1. Server Utilization 

These statistics provide some idea of the typical work being performed by an AFS file
server, but how does the performance of the server figure into customer purchasing and
system sizing decisions? The Parallel Data Laboratory recently upgraded its AFS server
from a dedicated SPARCstation 1 to a brand-new dedicated SPARCstation 20 with about
5 times the rated performance. This upgrade was done to a large extent in response to the
increasingly vocal complaints of slow performance by our users. In fact, little data was
consulted in the decision to upgrade this server. In an attempt to understand what effect
the resources available on our server have on user performance, we took a look at the load
on the original server after the upgrade. Given the traces described above, we can in
hindsight attempt to better understand how server load relates to file system performance
and customer satisfaction.

The top chart of Figure 3 shows the fraction of the server CPU spent in the AFS
fileserver process over the course of a week, averaged over ten minute intervals. As we
can see, the CPU on the server is mostly idle. Although we do see a number of peak
periods in which the utilization reaches as high as 65%, the mean CPU utilization is less
than 3%. This is a disturbing result. Were we wrong to spend about $10,000 for a new,
fast file server to replace a slow, inexpensive server that is only 3% utilized?
A similar effect is seen in the plot of disk activity in the lower chart of Figure 3. This
chart shows the total number of physical disk accesses completed in each of the same 10
minute intervals. It is harder to talk about percentage utilization in this case, but the three
drives on this server should be able to sustain considerably more than the 50,000
accesses/hour (14 accesses/second) that correspond to the highest point on the chart. The
average is less than one access/second over three disks. Again, a negligible total average
load.


Simply looking at these numbers, we might be tempted to conclude that this five year old
machine is performing adequately and there is no need for an upgrade at all.3 So how do
we explain our users’ complaints? We clearly needed some other measure that we could
use to gauge users’ perception of the performance of the system. Since overall utilization
[...], we surmised that looking at response time might prove more enlightening.


3 The upgrade policy at large AFS sites is rumored to be generally insensitive to utilization as well.
The algorithm used can roughly be paraphrased as: when customers complain, begin with the oldest
component of the system and continue to replace equipment with newer models until complaints subside.
4.2. Client Response Time

The client data that we collected provided hourly samples of the number and total elapsed
time of all AFS operations of each type completed by that client in that hour. We chose to
use the average response time for FetchStatus operations as our measure of user-visible
performance because 1) it is the most frequently-called operation, 2) in the absence of
outside influences, it does an approximately constant amount of work on each call (since
data fetches in AFS may be as large as several hundred kilobytes, but most files are much
smaller than this, FetchData delays are expected to be much more variable) and 3) we
found an r^2 coefficient of determination suggesting that 50% of the variation in
FetchStatus response times is shared with the per-kilobyte latencies of FetchData,
as shown in Figure 4.
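
For readers unfamiliar with the statistic, the following sketch shows how a Pearson r and the corresponding coefficient of determination can be computed from paired hourly samples. The sample values are hypothetical placeholders, not measurements from our traces, and the function names are chosen only for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Fraction of the variation in one series linearly explained by the other."""
    return pearson_r(xs, ys) ** 2


# Hypothetical hourly averages (illustrative values only).
fetch_status_ms = [12.0, 15.0, 40.0, 22.0, 90.0, 18.0]
fetch_data_ms_per_kb = [1.1, 1.0, 2.9, 2.4, 5.0, 1.3]

if __name__ == "__main__":
    print(f"r^2 = {r_squared(fetch_status_ms, fetch_data_ms_per_kb):.2f}")
```

An r^2 of 0.5 on real data would correspond to the "50% of the variation" reading used in this section; a value of 1.0 would mean the two series fall on a straight line.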

If we again look at the average response time in Figure 4, we see significant variation -
ranging over an order of magnitude. We hypothesize that users of AFS, accustomed to
local disk access times (due to high local cache hit ratios, as described above), will be
significantly affected by high variance in response times, particularly when the effect
lasts for significant lengths of time, such as the hourly intervals shown in this chart.
Based on this, we began searching for the causes of high variance in user response time.

Figure 4 - FetchStatus and FetchData Performance


In order to convince ourselves that our AFS server upgrade had indeed been worthwhile,
we performed an experiment to compare the performance of our old server and our new
server under the same workload. The numbers in Table 4 show the results of this
controlled experiment. One test client was constantly performing stat() calls [...]
same server. Both clients flushed their caches at the end of a cycle, so that [...]
were handled at the server. The table shows the average response time of the FetchStatus
[...] from the stat() calls [...] measuring process and the average throughput of the
competing process.

Machine            SPECint92   Average FetchStatus   Number of     Competing
                               Response Time         Operations    Read Transfer
SPARCstation 1+    [...]       [...]                   8,486       212.7
SPARCstation 20    [...]       [...]                  15,291       343.8

Table 4 - Direct Comparison of Server Platforms

From this experiment, we see that the increased CPU performance of the newer machine
reduces average FetchStatus response time by 35% at periods of high server load. At the
same time, the faster machine can complete almost twice as many FetchStatus operations
in the same time interval while also providing 62% higher data throughput. Since more
server processing power is clearly effective for improving client performance, we expect
[...] between server CPU utilization and client response time in our trace data.
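
The two ratios quoted above follow directly from the operation counts and transfer figures in Table 4. A minimal sketch of the arithmetic is given here; the helper name `relative_gain` is ours, and the units of the transfer column are not assumed because only the ratio matters.

```python
# Values taken from Table 4 (operation counts and competing transfer figures).
OLD_SERVER = {"operations": 8486, "transfer": 212.7}    # SPARCstation 1+
NEW_SERVER = {"operations": 15291, "transfer": 343.8}   # SPARCstation 20


def relative_gain(old, new):
    """Fractional improvement of `new` over `old` (0.5 means 50% more)."""
    return new / old - 1.0


ops_gain = relative_gain(OLD_SERVER["operations"], NEW_SERVER["operations"])
transfer_gain = relative_gain(OLD_SERVER["transfer"], NEW_SERVER["transfer"])

if __name__ == "__main__":
    print(f"FetchStatus operations completed: +{ops_gain:.0%}")
    print(f"competing data throughput:        +{transfer_gain:.0%}")
```

The operation count rises by about 80%, which is the "almost twice as many" in the text, and the throughput gain comes out at about 62%.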

4.3. Impact of the Network 

When we first compared the CPU and disk utilization trace to the FetchStatus response
time trace, we were unable to find a significant correlation between times of slow user
response and times of high server utilization. This unintuitive result led us to look for
other factors that might explain performance at the clients. The most obvious factor in a
distributed system is the network between machines, so this is the parameter we [...].

[...] of Figure 5 shows the average network round-trip time of pings on the lab
and machine room Ethernet segments over one hour periods. We see a mean of 9.0 ms
and a standard deviation of 7.2 ms on the server network, and 16.9 ms and 15.8 ms on the
lab segment, where most of the clients were located. The lower left portion of Figure 5
shows the graphical correlation between the response time of the network and FetchStatus
[...] linear relationship, the Pearson coefficient suggests that 35% of the variation in
the response times can be attributed to variation in network performance. To focus on
this relationship, the correlation graph in the lower right of Figure 5 reports only those
hours where average ping time was larger than 20 ms. In this figure, a linear relationship
between server response time and network response time is more plausible. This matches
our expectations that the network connecting the machines in a distributed system is a
considerable factor in overall performance. It is for this reason that the [...] server and
many of our clients are being outfitted with switched ATM networking dedicated to the
PDL in addition to the existing Ethernet. However, we also see that network response
time is not a complete explanation of client response time variance.

Directly correlated data, with 100% of the variation explained, would appear as a straight line on these
[...].

Figure 5 - Correlation of Response Time with Network Behavior


Our next step was to [...] eliminated from the response [...] the remaining variance as
we expected [...], we did notice an effect that we had not considered in our initial
analysis. Although all of the project volumes for [...] server we were tracing, home
directories and shared binaries were being accessed on servers shared across the
department. Since we were looking at [...] operations performed in hour-long intervals,
load on these shared servers [...] significant impact on user response time. We see a
significant r^2 [...] between clients of the same system type, suggesting that about 65%
of [...] client’s response time trace is [...] response time trace of machines of the same
[...] the response time seen at millburn (an RS/6000) and [...].

[...] between millburn [...]. [...] mistake was to overestimate the [...] of commonly used
binaries and underestimate the frequency with which users’ home directories are used in
the course of project work. Although most of the user [...] be stored on a fast server,
binaries and home directories stored on shared, slow servers may be a considerable drag
on user-visible performance.

Figure 6 - Correlation of Response Time by Client System Type
4.5. Impact of Server Utilization 

[...] other than the one we are tracing, we filtered the response time data to include
[...] those periods when a host was active on our server. Figure 7 shows the [...]
much of the response time is not [...].

[...] replication of the most-used shared files on our upgraded server to improve [...]

[...] extract the delays associated with central AFS servers, we expect some amount of
uncorrelated points. Neglecting data points with less than 500 disk accesses per hour in
the center plot, we see an r^2 correlation of 25%, as response times are impacted by the
amount of disk work (dominated by FetchData operations) the server is already
processing when new requests arrive. In the rightmost correlation plot, we see an even
closer correlation with CPU utilization (for the same set of points as in the center plot,
when the disks are busy), which explains 50% of the variation in client response time
(after network and non-project servers are accounted for). This result fits well with our
prior [...] these numbers scale with the amount of data being moved [Gibson96].

[...] being transferred, during which server load leads to poor client response times.
5. Network-Attached Storage

5.1. Opportunities for Network-Attached Storage
Recalling Figure 2, which shows how the distributed file server [...]

Figure 8 - Network-Attached Storage Architecture


There is a range of possible configurations for such a system. At one end of the [...]
transfer interface to instruct drives to transfer data directly to clients. At [...] the
spectrum, dedicated Network File System (NFS) or Net[...] specially optimized hardware
configurations. Network-attached [...] management, but drives would have sufficient
intelligence to handle [...] desired scalability and performance. [...] may also be [...]
status inquiry functions handled at the drives [Gibson96].
This direct transfer concept is not a new one. In 1991, Randy Katz described the basic
advances that make network-attached devices feasible [Katz91]. The High Performance
Storage Systems project [Watson95] is exploring these technologies in the context of
large MPP and SMP systems based on the framework of the Mass Storage Systems
Reference Model [Miller88]. Van Meter provides a survey of current products and major
research issues, including security, network protocols, and the changes in operating
system paradigms necessary to efficiently support network-attached devices [Van
Meter96].

Such an architecture raises several important issues. Can the drive be made sufficiently 
intelligent at a reasonable cost? How do we ensure the security and integrity of the data 
being stored? Can enough of the server functionality be off-loaded to significantly 
improve both throughput and scalability? How effective will this architecture be for
meeting the needs of the clients in a distributed system? 

5.2. Implications of this Study for Network-Attached Storage
The biggest lesson that we take away from the preceding analysis is that the mean
behavior of the system is essentially irrelevant. Even though the system is 97% idle when
measured in total, it is the high load periods that matter to customer satisfaction. As Table
5 shows, peak loads, even at the granularity of an hour, are much higher than average
loads. Moreover, the distribution of operations measured over the long term, shown on
the left of Table 5 and similar to previous studies [Spasojevic96], is not preserved in these
peak periods - data activity is nearly twice as common in these peaks. With customer
satisfaction sensitive to response time variation, the server performance during peak loads
is likely to be more important than at other times.


                     Weekly Total                    Peak Hour
Operations           total     fraction   hourly     total    fraction
FetchStatus          412,695   70.6%       1,247      6,209   45.3%
StoreStatus           22,642    3.9%         134        175    1.3%
FetchData             62,288   10.7%         370      4,219   30.8%
StoreData             32,414    5.5%         192        147    1.1%
CreateFile            17,089    2.9%         101         52    0.4%
RemoveFile            20,422    3.5%         122      2,587   18.9%
GiveUpCallbacks       17,298    3.0%         103        326    2.4%
total                584,848               2,269     13,715

Table 5 - Distribution of Server Operations
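
The shift toward data transfer during the peak hour can be verified from the counts in Table 5. The sketch below simply sums the FetchData and StoreData rows and divides by the column totals; the dictionary and function names are chosen for this illustration only.

```python
# Operation counts from Table 5: whole week versus the single peak hour.
WEEKLY = {
    "FetchStatus": 412695, "StoreStatus": 22642, "FetchData": 62288,
    "StoreData": 32414, "CreateFile": 17089, "RemoveFile": 20422,
    "GiveUpCallbacks": 17298,
}
PEAK_HOUR = {
    "FetchStatus": 6209, "StoreStatus": 175, "FetchData": 4219,
    "StoreData": 147, "CreateFile": 52, "RemoveFile": 2587,
    "GiveUpCallbacks": 326,
}


def data_share(counts):
    """Fraction of all operations that move file data (FetchData + StoreData)."""
    data_ops = counts["FetchData"] + counts["StoreData"]
    return data_ops / sum(counts.values())


weekly_share = data_share(WEEKLY)
peak_share = data_share(PEAK_HOUR)
peak_to_weekly = peak_share / weekly_share

if __name__ == "__main__":
    print(f"weekly data share: {weekly_share:.1%}")
    print(f"peak data share:   {peak_share:.1%}")
    print(f"ratio:             {peak_to_weekly:.2f}x")
```

Data operations make up about 16% of the weekly mix but about 32% of the peak hour, a ratio just under two, which is the basis for the "nearly twice as common" statement.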

Given a high emphasis on the server performance during peak loads, off-loading the
high-cost data movement operations, as proposed by the network-attached storage
architecture, should decrease the variance in user response time significantly, even
though overall averages will simply be reduced from a small number to an even smaller
number. The appropriate analogy is not to system throughput, but to something like the
way reliability is measured. Changing the mean time to data loss (MTTDL) of a system
from 10 years to 100 years does not mean that one expects the system to last ten times as
long, but that the probability of a failure occurring within the next hour is reduced by an
order of magnitude. We suggest that there is an analogous measure for distributed file
systems, the mean time until burst (bad) performance (MTTBP), which should be
increased so that the probability of poor response times in any given hour of work is
decreased. We would expect users to be pleased if the occurrence of a period of bad
response time were reduced from once a week to once every 3 months.
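
The reliability analogy can be made concrete with a small calculation. The sketch below assumes a memoryless (exponential) model, in which the probability of an event within t hours is 1 - exp(-t / mean); this model, the function name, and the approximation of 3 months as 13 weeks are our assumptions for illustration and are not part of the measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365
HOURS_PER_WEEK = 24 * 7


def prob_within(hours, mean_hours):
    """P(event occurs within `hours`) under an assumed exponential model."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -math.expm1(-hours / mean_hours)


# MTTDL of 10 years versus 100 years, looking one hour ahead.
p_loss_10y = prob_within(1, 10 * HOURS_PER_YEAR)
p_loss_100y = prob_within(1, 100 * HOURS_PER_YEAR)
mttdl_ratio = p_loss_10y / p_loss_100y

# MTTBP of one week versus roughly three months (taken here as 13 weeks).
p_bad_week = prob_within(1, HOURS_PER_WEEK)
p_bad_quarter = prob_within(1, 13 * HOURS_PER_WEEK)
mttbp_ratio = p_bad_week / p_bad_quarter

if __name__ == "__main__":
    print(f"hourly loss probability drops by {mttdl_ratio:.2f}x")
    print(f"hourly bad-period probability drops by {mttbp_ratio:.2f}x")
```

Under this model a tenfold increase in MTTDL lowers the hourly probability of data loss by very nearly a factor of ten, and moving from weekly to quarterly bad periods lowers the hourly probability of poor response by a factor of about thirteen.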

6. Conclusions 

Modern distributed file systems such as AFS very successfully cache file data on client
machines. While this ensures that average response time is low, it also ensures large
variance in response time because operations that must contact remote servers are much
slower. Direct measurement of these remote servers shows that their overall utilization
can be quite low, 3% in our data, while users are simultaneously sufficiently dissatisfied
with performance to pay for a faster server. This study shows that the faster server is in
fact needed because, although 97% idle overall, these file servers can be intensely
overloaded [...] to periods of poor response time long enough to [...] users.

In addition to focusing our attention on burst server loads, our analysis shows that the
distribution of operation types during bursts is different from overall distributions
[...] workloads with much more data transfer than the overall [...].

These results confirm our intuition that network-attached storage, if it can re-route most
data transfer directly to storage devices, has the potential to reduce customer response
time in two ways: 1) it avoids the copying steps at the server and 2) it off-loads the work
of data transfer from the server, reducing the chance of a burst of overutilization.

Our future work, then, is to evaluate the client performance on such network-attached
storage architectures and demonstrate the implications on distributed file system design.